Introduction

The goal of this project was to use data provided by Washington State on local watersheds and salmon runs to learn how to use the ggmap package in R. Accompanying objectives for this project are:

Data Collecting

Finding this data was quite simple. I acquired it through the Washington State Department of Ecology. However, when I later tried to find the link to the data request page, I was disappointed to find that the page I had used had been taken down. I do not currently know where the request page is located.

Data Setup

The data set I obtained was in Excel format, which I then converted to a .csv file through Excel's export function. From there, very little cleanup was necessary to use the data effectively.

Setup required a few packages and one adjustment when reading the data set: blank cells are read in as NAs.

library(plyr)
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.1       ✔ purrr   0.3.2  
## ✔ tibble  2.1.1       ✔ dplyr   0.8.0.1
## ✔ tidyr   0.8.3       ✔ stringr 1.4.0  
## ✔ readr   1.3.1       ✔ forcats 0.4.0
## ── Conflicts ─────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::arrange()   masks plyr::arrange()
## ✖ purrr::compact()   masks plyr::compact()
## ✖ dplyr::count()     masks plyr::count()
## ✖ dplyr::failwith()  masks plyr::failwith()
## ✖ dplyr::filter()    masks stats::filter()
## ✖ dplyr::id()        masks plyr::id()
## ✖ dplyr::lag()       masks stats::lag()
## ✖ dplyr::mutate()    masks plyr::mutate()
## ✖ dplyr::rename()    masks plyr::rename()
## ✖ dplyr::summarise() masks plyr::summarise()
## ✖ dplyr::summarize() masks plyr::summarize()
library(ggmap)
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
library(stringi)
library(ggplot2)
library(dplyr)

df1 = read.csv("Final_Project_files/Wasden.csv", na.strings = c("","NA"))
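
To illustrate what the na.strings argument does, here is a minimal sketch on a small inline table. The column values are made up for illustration and are not taken from the Ecology data set.

```r
# Hypothetical three-row table with blank cells, read from an inline string
csv_text = "StreamName,Live_Count\nAlder Creek,12\n,7\nBear Creek,"

raw   = read.csv(text = csv_text, stringsAsFactors = FALSE)
clean = read.csv(text = csv_text, na.strings = c("", "NA"), stringsAsFactors = FALSE)

# Without na.strings the blank stream name stays an empty string;
# with it, the blank becomes NA. Blank numeric cells are NA either way.
is.na(raw$StreamName)
is.na(clean$StreamName)
is.na(clean$Live_Count)
```

Treating blanks as NA at read time means later steps such as na.omit() see them as missing values rather than as empty strings.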

Initial Cleanup

Some rows needed to be removed because their data had been placed in completely different columns.

Other cleanup steps required very little time, such as removing NAs, filtering out particularly large outliers, and dropping streams that were difficult to identify.

IT SHOULD BE NOTED: I made a mistake when doing cleanup during this step. The na.omit() command removes every row that contains an NA in any column. This ended up drastically reducing the number of entries in the data set. However, looking back, it was a bit of a relief during my coordinate search to have to locate only 300 individual streams rather than over 700, which would have taken me much longer.
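
One way to avoid losing rows to NAs in columns the analysis does not need is to check only the required columns. The following is a minimal sketch with made-up data; the toy data frame and the choice of required columns are assumptions for illustration only.

```r
# Hypothetical data with NAs scattered across different columns
toy = data.frame(
  StreamName = c("Alder Creek", "Bear Creek", NA, "Cedar Creek"),
  Live_Count = c(12, NA, 3, 8),
  Flow       = c(NA, "Low", "High", "Low"),
  stringsAsFactors = FALSE
)

# na.omit() drops a row if ANY column is NA
all_complete = na.omit(toy)

# complete.cases() on a column subset only requires the listed columns
required = c("StreamName", "Live_Count")
needed_complete = toy[complete.cases(toy[, required]), ]

nrow(all_complete)     # 1
nrow(needed_complete)  # 2
```

In this toy case the first row survives the second approach because its only NA is in Flow, which is not in the required set.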

df = df1[c(2:4,7,11:15)]
df = na.omit(df)

dfINDX = df %>%
  filter(Species == "INDX")

dfFOOT = df %>%
  filter(Species == "FOOT")

dfSPOT = df %>% 
  filter(Species == "SPOT")

dfSUPP = df %>% 
  filter(Species == "SUPP")

dfREMOVE.1 = join(dfINDX, dfFOOT)
## Joining by: WRIA_Num, StreamName, StreamCatalogCode, Flow, Species, RunYear, Live_Count, Dead_Count, PercentSeen
dfREMOVE.2 = join(dfREMOVE.1, dfSPOT)
## Joining by: WRIA_Num, StreamName, StreamCatalogCode, Flow, Species, RunYear, Live_Count, Dead_Count, PercentSeen
dfREMOVE = join(dfREMOVE.2, dfSUPP)
## Joining by: WRIA_Num, StreamName, StreamCatalogCode, Flow, Species, RunYear, Live_Count, Dead_Count, PercentSeen
dfFINAL.1 = anti_join(df, dfINDX)
## Joining, by = c("WRIA_Num", "StreamName", "StreamCatalogCode", "Flow", "Species", "RunYear", "Live_Count", "Dead_Count", "PercentSeen")
dfFINAL.2 = anti_join(dfFINAL.1, dfFOOT)
## Joining, by = c("WRIA_Num", "StreamName", "StreamCatalogCode", "Flow", "Species", "RunYear", "Live_Count", "Dead_Count", "PercentSeen")
dfFINAL.3 = anti_join(dfFINAL.2, dfSPOT) 
## Joining, by = c("WRIA_Num", "StreamName", "StreamCatalogCode", "Flow", "Species", "RunYear", "Live_Count", "Dead_Count", "PercentSeen")
dfFINAL = anti_join(dfFINAL.3, dfSUPP)
## Joining, by = c("WRIA_Num", "StreamName", "StreamCatalogCode", "Flow", "Species", "RunYear", "Live_Count", "Dead_Count", "PercentSeen")
# Drop streams whose names start with "Unnamed" (match against dfFINAL itself)
dfCOORD = dfFINAL[!grepl('^Unnamed', dfFINAL$StreamName),]

dfCOORD.1 = dfCOORD %>%
  filter(PercentSeen > 100)

dfCOORD = anti_join(dfCOORD, dfCOORD.1)
## Joining, by = c("WRIA_Num", "StreamName", "StreamCatalogCode", "Flow", "Species", "RunYear", "Live_Count", "Dead_Count", "PercentSeen")
# Map WRIA numbers to watershed names with a lookup vector, so that
# multi-digit codes such as "10" or "15" are matched as whole values.
wria_names = c(
  "1" = "Nooksack", "2" = "San Juan", "3" = "Lower Skagit-Samish",
  "4" = "Upper Skagit", "5" = "Stillaguamish", "6" = "Island",
  "7" = "Snohomish", "8" = "Cedar/Sammamish", "9" = "Duwamish/Green",
  "10" = "Puyallup/White", "11" = "Nisqually", "12" = "Chambers-Clover",
  "13" = "Deschutes", "14" = "Kennedy-Goldsborough", "15" = "Kitsap",
  "16" = "Skokomish-Dosewallips"
)
wria_codes = as.character(dfCOORD$WRIA_Num)
# Codes not in the lookup are left unchanged
dfCOORD$WRIA_Num = ifelse(wria_codes %in% names(wria_names), wria_names[wria_codes], wria_codes)



dfCOORD$RunYear = as.numeric(as.character(dfCOORD$RunYear))
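
The chain of anti_join() calls and the pattern-matching step above can also be written as a single filter() call. The sketch below uses a small made-up data frame that borrows three of the column names; it is an alternative formulation under those assumptions, not the code that produced the results in this report.

```r
library(dplyr)

# Hypothetical rows covering each removal rule
toy = data.frame(
  StreamName  = c("Alder Creek", "Unnamed Trib", "Bear Creek", "Cedar Creek", "Deer Creek"),
  Species     = c("Coho", "Chum", "INDX", "Chinook", "FOOT"),
  PercentSeen = c(80, 90, 50, 120, 100),
  stringsAsFactors = FALSE
)

# Species values that the cleanup above removes
removed_codes = c("INDX", "FOOT", "SPOT", "SUPP")

cleaned = toy %>%
  filter(
    !Species %in% removed_codes,     # drop rows with these codes
    !grepl("^Unnamed", StreamName),  # drop unnamed streams
    PercentSeen <= 100               # drop PercentSeen outliers
  )

cleaned$StreamName
```

Because grepl() returns a logical vector, this form also behaves sensibly when no stream name matches the pattern, whereas negative indexing with an empty grep() result would return zero rows.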